DIMACS at the TREC 2005 Genomics Track
Abstract
This report describes DIMACS work on the text categorization task of the TREC 2005 Genomics track. Our approach to this task was similar to the one we used for the triage subtask of the TREC 2004 Genomics track. We applied Bayesian logistic regression and achieved good effectiveness on all categories.

1. TEXT CATEGORIZATION TASK

The Mouse Genome Informatics (MGI) project of the Jackson Laboratory provides data on the genetics, genomics, and biology of the laboratory mouse. In particular, the Mouse Genome Database (MGD) contains information for the mouse system annotated from literature. To find information on mouse genomics biology, MGI first automatically scans new scientific literature for records containing one or more of the words “mouse”, “mice”, and “murine”. In a triage step, MGI personnel then check each article to see if it contains information appropriate for inclusion in MGD. The goal of this triage process is to limit the number of articles sent to human curators for more detailed analysis.

The TREC 2005 Genomics track [4] defined a categorization task based on simplified versions of the MGI triage process. It consists of the triage subtask from the TREC 2004 Genomics track [3], which aims to identify articles for Gene Ontology annotation, as well as three other major topics of interest to MGI. This year’s categorization task includes the following four categories:
• Alleles of mutant types,
• Embryologic gene expression,
• Gene Ontology (from TREC 2004),
Similar resources
DIMACS at the TREC 2004 Genomics Track
DIMACS participated in the text categorization and ad hoc retrieval tasks of the TREC 2004 Genomics track. For the categorization task, we tackled the triage and annotation hierarchy subtasks. 1. TEXT CATEGORIZATION TASK The Mouse Genome Informatics (MGI) project of the Jackson Laboratory provides data on the genetics, genomics, and biology of the laboratory mouse. In particular, the Mouse Geno...
Symbol-Based Query Expansion Experiments at TREC 2005 Genomics Track
This paper illustrates the activity conducted at the TREC 2005 evaluation campaign in the ad-hoc task of the Genomics track. The retrieval effectiveness of a relevance feedback query expansion algorithm, which is based on symbols, is studied. The experimental results suggest that query expansion based on implicit relevance feedback is not always an effective means for improving effectiveness in...
TREC 2005 Genomics Track Experiments at IBM Watson
This paper describes our experiments in the TREC 2005 Genomics Track. For the ad-hoc retrieval task, we study synonym-based query expansion, as well as the effectiveness of a new pseudo-relevance feedback method which is derived from our recent work on semi-supervised learning. For the categorization task, we study various methods for estimating conditional class probability and determining the...
IIT TREC 2005: Genomics Track
For the TREC-2005 Genomics Track ad-hoc retrieval task, we report on the development of a scalable information retrieval engine based on a relational data model for the integration of structured data and text. Our objectives are to meet the need for the integrated search of heterogeneous data sets of biomedical literature and structured data found in biological databases, and to demonstrate the...